Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Oracle IMDSv2 API #528

Merged
merged 7 commits into from
Aug 13, 2020
Merged

Conversation

TheRealFalcon
Copy link
Member

  • v2 of the API is now default with fallback to v1.
  • Refactored the Oracle datasource to fetch version, instance, and vnic metadata simultaneously.

@@ -171,10 +105,11 @@ class DataSourceOracle(sources.DataSource):
sources.NetworkConfigSource.system_cfg,
)

_network_config = sources.UNSET
_network_config = sources.UNSET # type: dict
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure how to declare these types. If I don't declare anything, my type checking complains anywhere I use _network_config as a dict because sources.UNSET is a string. I still get a complaint on this line because I'm declaring a string as a dict, but it has quieted the rest of the warnings. If we were being strict it would be a Union[dict|str], but since we don't ever actually want to use it as a string, that also seemed wrong to me (and I didn't want to add the union import just for that).

Shrug. Something to think about if we start to incorporate more typing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think sources.UNSET is going to be a barrier to typing the codebase more generally, so I think we should set it to one side here. I'd probably leave this untyped, rather than applying a not-quite-correct annotation; we'll correct it as part of a broader change.

data = read_opc_metadata()
fetch_vnics_data = self.ds_cfg.get('configure_secondary_nics', False)
# suppress() in this context is just a no-op context manager
network_context = suppress() if _is_iscsi_root(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love using suppress here, but duplicating the read_opc_metadata() call (especially if it's taking arguments now) seems worse to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; elsewhere in the codebase (albeit in testing code) we use ExitStack for this, and we import it as something that indicates how we're using it:

from contextlib import ExitStack as does_not_raise

What do you think to following that pattern here? (Perhaps as noop is more apropos here? Or maybe nullcontext?)

* v2 of the API is now default with fallback to v1.
* Refactored the Oracle datasource to fetch version, instance, and vnic metadata simultaneously.
Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, James! This is a complex change, and you've done a great job with it. Comments are inline.

data = read_opc_metadata()
fetch_vnics_data = self.ds_cfg.get('configure_secondary_nics', False)
# suppress() in this context is just a no-op context manager
network_context = suppress() if _is_iscsi_root(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; elsewhere in the codebase (albeit in testing code) we use ExitStack for this, and we import it as something that indicates how we're using it:

from contextlib import ExitStack as does_not_raise

What do you think to following that pattern here? (Perhaps as noop is more apropos here? Or maybe nullcontext?)

cloudinit/sources/tests/test_oracle.py Show resolved Hide resolved
@@ -171,10 +105,11 @@ class DataSourceOracle(sources.DataSource):
sources.NetworkConfigSource.system_cfg,
)

_network_config = sources.UNSET
_network_config = sources.UNSET # type: dict
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think sources.UNSET is going to be a barrier to typing the codebase more generally, so I think we should set it to one side here. I'd probably leave this untyped, rather than applying a not-quite-correct annotation; we'll correct it as part of a broader change.

@@ -247,13 +185,16 @@ def network_config(self):
self._network_config = self.distro.generate_fallback_config()

if self.ds_cfg.get('configure_secondary_nics'):
if self._vnics_data == sources.UNSET:
LOG.warning("Network config is UNSET but should not be")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Network config" is a little ambiguous in this context; I had to read this a couple of times to see that this was testing self._vnics_data, not self._network_config.

@@ -247,13 +185,16 @@ def network_config(self):
self._network_config = self.distro.generate_fallback_config()

if self.ds_cfg.get('configure_secondary_nics'):
if self._vnics_data == sources.UNSET:
LOG.warning("Network config is UNSET but should not be")
return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure return None is the right behaviour here: _add_network_config_from_opc_imds only adds to self._network_config, so we do have some valid config already determined at this point. If we return None from .network_config then _find_networking_config in stages.py will move onto considering other sources of network config.

For Oracle, the order in which network config sources will be considered is:

network_config_sources = (
sources.NetworkConfigSource.cmdline,
sources.NetworkConfigSource.ds,
sources.NetworkConfigSource.initramfs,
sources.NetworkConfigSource.system_cfg,
)

So if we return None here: if there is initramfs configuration then that will be used, if not then fallback config will be generated (the system_cfg source is used rarely). That logic is identical to the logic here, but excludes the addition of secondary NICs (which is fine; we know there aren't any if we're executing this line) and the call to _ensure_netfailover_safe. The message on the commit that introduced _ensure_netfailover_safe (fa47d52) indicates that this should be called for fallback network configuration, so this is less fine.

(TL;DR: I think we need to skip the addition of the secondary NICs here, rather than aborting processing entirely.)

(2, does_not_raise()),
(3, does_not_raise()),
(4, pytest.raises(UrlError)),
],
)
def test_retries(self, expectation, failure_count, httpretty):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't testing retries on the VNIC route. (I suspect a separate test which just takes the happy path for instance data so it can test VNIC fetching in isolation would be easiest.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having reviewed the whole PR now: I think we're missing fetch_vnics_data testing more generally: all the tests still pass with this patch applied:

--- a/cloudinit/sources/DataSourceOracle.py
+++ b/cloudinit/sources/DataSourceOracle.py
@@ -305,6 +305,7 @@ def read_opc_metadata(*, fetch_vnics_data: bool = False):
         )._response.json()
     vnics_data = None
     if fetch_vnics_data:
+        assert False
         try:
             vnics_data = readurl(
                 METADATA_PATTERN.format(

cloudinit/sources/tests/test_oracle.py Show resolved Hide resolved
else:
with dhcp.EphemeralDHCPv4(net.find_fallback_nic()):
data = read_opc_metadata()
fetch_vnics_data = self.ds_cfg.get('configure_secondary_nics', False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now perform the defaulting of configure_secondary_nics in two places: here and in .network_config. It would be preferable to have the default configurable in a single place for ease of maintenance.

Copy link
Member Author

@TheRealFalcon TheRealFalcon Aug 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm don't understand what you mean by 'defaulting of configure_secondary_nics'. Can you explain it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line determines what the default value of configure_secondary_nics will be if it isn't present in self.ds_cfg: False. In .network_config, we also determine what the default value of configure_secondary_nics will be if it isn't present (and in that case as we don't supply a second argument to .get, it will be None).

Though, further confusing matters, we actually set the default in BUILTIN_DS_CONFIG which should mean that configure_secondary_nics is always set in self.ds_cfg (so we could use regular dict access instead of .get). However, that goes through util.mergemanydict and I don't know for sure that that couldn't result in a dict without the key present under certain user inputs.

Does that make any more sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mainly because of this

so we could use regular dict access instead of .get

I guess I don't see the problem with using .get here (or anywhere and everything 😉 ). Since we're not persisting the result of this anywhere, I don't see it as "defaulting a value", but rather eliminating the need for None checks in case something hasn't been set. Why would having things as they are increase the maintenance burden?

Comment on lines 210 to 213
values, as it should continue to be configured for DHCP. As such, this
takes an existing network_config dict which is expected to have the
primary NIC configuration already present.
It will mutate the given dict to include the secondary VNICs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still written as if this callable takes a parameter.

# read_opc_metadata should JSON decode the response and return it
expected = json.loads(OPC_V2_METADATA)
assert expected == oracle.read_opc_metadata().instance_data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also be testing the version attribute of read_opc_metadata's return here (and in the test below)?

@OddBloke
Copy link
Collaborator

OddBloke commented Aug 12, 2020 via email

def _mock_v2_urls(httpretty):
def instance_callback(request, uri, response_headers):
assert request.headers.get("Authorization") == "Bearer Oracle"
return [200, {}, OPC_V2_METADATA]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the package build in xenial is failing during the tests: https://travis-ci.com/github/canonical/cloud-init/jobs/371325182

I believe this is because the version of httpretty (0.8.6) in xenial does not include gabrielfalcao/HTTPretty@bdb7ee7 (which was first included in 0.8.11). So I think you'll need to add some default response header values to avoid these tracebacks.

(This isn't caught by the "xenial" unittests in Travis because we don't have HTTPretty pinned correctly; I'll be submitting a PR to address that shortly.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#531 is that PR.

@TheRealFalcon
Copy link
Member Author

@OddBloke test fixed

Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the speedy turnaround, James! Just two remaining (minor) areas of concern.

Have you performed any testing of this in Oracle yet? (Do you have access?)

Comment on lines 134 to 135
network_context = noop() if _is_iscsi_root(
) else dhcp.EphemeralDHCPv4(net.find_fallback_nic())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't love this continuation; black would format this as:

network_context = (
    noop()
    if _is_iscsi_root()
    else dhcp.EphemeralDHCPv4(net.find_fallback_nic())
)

which is actually more lines than:

if _is_iscsi_root():
    network_context = noop()
else:
    network_context = dhcp.EphemeralDHCPv4(net.find_fallback_nic())

and you can even golf that down a little further if you want:

network_context = noop()
if not _is_iscsi_root():
    network_context = dhcp.EphemeralDHCPv4(net.find_fallback_nic())

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(And before you say anything:

In [2]: len("        network_context = noop() if _is_iscsi_root() else dhcp.EphemeralDHCPv4(net.find_fallback_nic())") > 100
Out[2]: True

😁 )

@@ -175,6 +109,7 @@ class DataSourceOracle(sources.DataSource):

def __init__(self, sys_cfg, *args, **kwargs):
super(DataSourceOracle, self).__init__(sys_cfg, *args, **kwargs)
self._vnics_data = sources.UNSET
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm not being distracted by the type annotation, I've realised that I have this question: is there any reason to use sources.UNSET over None here? The use of this attribute is entirely contained to this class, so I don't think we need to follow that (IMO, anti-)pattern.

Comment on lines 193 to 195
if self._vnics_data == sources.UNSET:
LOG.warning(
"Secondary NIC data is UNSET but should not be")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think to moving this guard into _add_network_config_from_opc_imds? As currently written, _add_network_config_from_opc_imds will behave confusingly if called before self._vnics_data is set. It seems natural for that method to own its own input validation.

(As things stand, I think it will behave very confusingly: self._vnics_data will be "_unset", so we'll get a TypeError: string indices must be integers when we attempt to effectively do "u"['.macAddr'].lower(); if we do switch to self._vnics_data = None in __init__ then I think we'll get the less-confusing-but-still-avoidable: TypeError: 'NoneType' object is not subscriptable.)

Copy link
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this code now, thanks James! I'm going to wait for you to complete the testing you're doing before landing this.

@TheRealFalcon
Copy link
Member Author

@OddBloke , I manually checked the following:

  • cloud-init query -a after installing the new deb and running a clean with reboot returns the new metadata, including "subplatform": "metadata (http://169.254.169.254/opc/v2/)"
  • cloud-init status is done
  • nothing strange in cloud-init.log or cloud-init-output.log
  • main of DataSourceOracle.py returns the expected metadata and v2

If there's nothing else, I'm comfortable with where this is at.

@josh-potter
Copy link

@OddBloke , I manually checked the following:

  • cloud-init query -a after installing the new deb and running a clean with reboot returns the new metadata, including "subplatform": "metadata (http://169.254.169.254/opc/v2/)"
  • cloud-init status is done
  • nothing strange in cloud-init.log or cloud-init-output.log
  • main of DataSourceOracle.py returns the expected metadata and v2

If there's nothing else, I'm comfortable with where this is at.

I can manually kill the v2 endpoint on your test instance if we want to validate failover logic as well. I'll just need the instance id (e.g. ocid1.instance.foo) and a time window to have it disabled.

@TheRealFalcon
Copy link
Member Author

@josh-potter Great, thanks. Instance id is ocid1.instance.oc1.phx.anyhqljtniwq6sycegkrokuyq2w27j4rm43i6zvgzphk5our2mle2j2rszrq

Time frame can be as soon as works for you. I should only need a couple minutes, but if you need a specific time frame, lets say 30 minutes. If you just let me know when you plan on turning it off I can get to it right away.

@josh-potter
Copy link

@TheRealFalcon I just need to document the change on our side before executing. That'll take me a little bit to get together. Let's target 2:00 PM EST. I'll leave the v2 endpoint disabled on that instance for 30 minutes.

@josh-potter
Copy link

@TheRealFalcon the v2 endpoint is now disabled. I'll add it back at 2:30 PM EST or sooner if you get finished up with testing.

@TheRealFalcon
Copy link
Member Author

Thanks @josh-potter . I got what I need

@TheRealFalcon
Copy link
Member Author

I manually checked the same things I mentioned earlier, except checking for v1 instead of v2. Also, I diff'ed the cloud-init query -a and python3 DataSourceOracle.py outputs comparing v2 to v1, and the only differences were in the version strings.

I'm comfortable to merge.

@OddBloke
Copy link
Collaborator

@TheRealFalcon Do you want to author the squash commit message, or shall I come up with something?

@TheRealFalcon
Copy link
Member Author

TheRealFalcon commented Aug 13, 2020

@OddBloke I was thinking the title plus first comment. If you think there's more or less we should add, use your creative liberties 😉

"Support Oracle IMDSv2 API

  • v2 of the API is now default with fallback to v1.
  • Refactored the Oracle datasource to fetch version, instance, and vnic metadata simultaneously."

@OddBloke OddBloke merged commit 546617c into canonical:master Aug 13, 2020
@TheRealFalcon TheRealFalcon deleted the oracle-v2 branch August 13, 2020 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants